VReplication: Implement Experimental Parallel Applier #19535
This is modeled after MySQL's parallel worker implementation and the default/serial vplayer/vreplicator implementation in Vitess. Signed-off-by: Matt Lord <mattalord@gmail.com>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             main   #19535        +/-   ##
===========================================
- Coverage   69.67%   62.04%      -7.63%
===========================================
  Files        1614      120       -1494
  Lines      216793    23137     -193656
===========================================
- Hits       151044    14355     -136689
+ Misses      65749     8782      -56967
📝 Documentation updates suggested
I've created documentation updates for this PR:
- vitessio/website - VReplication parallel applier docs
- vitessio/vitess - Changelog entry
🤖 Generated by Promptless
mhamza15 left a comment
This looks great! Left just a few comments. The overall design looks sound. One thought I had: to my knowledge, if the current transaction is blocked by a dependency on an in-flight one, future transactions will be blocked until the current one is unblocked, right? Something like:
TX 1: row A // <-- in-flight
TX 2: row A // <-- head of the queue, has a dependency on TX 1, will wait
TX 3: row B // <-- blocked, even though it is not dependent on any in-flight or previous transactions in the queue
If we track each transaction's dependencies, we could allow TX 3 to run in parallel, no?
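The difference can be illustrated with a minimal sketch. It is not the PR's scheduler: transactions are reduced to a hypothetical pendingTxn (commit order plus string row keys), and classification flags, commit metadata, and backpressure are ignored. dispatchHeadOfLine stops at the first blocked transaction, like the popReadyLocked behavior described in the PR; dispatchDependencyAware is one possible reading of the suggestion above, where a later transaction is dispatched as long as it conflicts with neither the inflight set nor any earlier transaction that is still waiting.

```go
package main

import "fmt"

// pendingTxn is a hypothetical, simplified pending transaction: its commit
// order and the row keys (writeset) it touches.
type pendingTxn struct {
	order int
	keys  []string
}

// conflictsWith reports whether t touches any key present in held.
func conflictsWith(t pendingTxn, held map[string]bool) bool {
	for _, k := range t.keys {
		if held[k] {
			return true
		}
	}
	return false
}

// dispatchHeadOfLine scans the queue in order and stops at the first blocked
// transaction, so everything behind it waits too.
func dispatchHeadOfLine(pending []pendingTxn, inflight map[string]bool) []int {
	var dispatched []int
	for _, t := range pending {
		if conflictsWith(t, inflight) {
			break
		}
		dispatched = append(dispatched, t.order)
		for _, k := range t.keys {
			inflight[k] = true
		}
	}
	return dispatched
}

// dispatchDependencyAware skips a blocked transaction but remembers its keys,
// so a later transaction is dispatched only if it conflicts with neither the
// inflight set nor any earlier transaction that is still waiting.
func dispatchDependencyAware(pending []pendingTxn, inflight map[string]bool) []int {
	var dispatched []int
	waiting := map[string]bool{}
	for _, t := range pending {
		if conflictsWith(t, inflight) || conflictsWith(t, waiting) {
			for _, k := range t.keys {
				waiting[k] = true
			}
			continue
		}
		dispatched = append(dispatched, t.order)
		for _, k := range t.keys {
			inflight[k] = true
		}
	}
	return dispatched
}

func main() {
	// TX 1 holds row A; TX 2 wants row A; TX 3 wants row B.
	queue := []pendingTxn{{order: 2, keys: []string{"A"}}, {order: 3, keys: []string{"B"}}}
	fmt.Println(dispatchHeadOfLine(queue, map[string]bool{"A": true}))      // []
	fmt.Println(dispatchDependencyAware(queue, map[string]bool{"A": true})) // [3]
}
```

With strict in-order commits, dispatching TX 3 early only lets its row changes be applied early; it still cannot commit before TX 2, so its worker (and connection) stays occupied until TX 2 commits.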
arthurschreiber left a comment
This looks good to me! ❤️
Here are a few things I noticed while reviewing this and trying it locally:
I think there's a bug in the benchmarks that you provided. The benchmark waits for _vt.vreplication.state == 'Running' to be true before stopping the workflow and generating a backlog. It seems that this signal is not actually accurate, which then causes the backlog to be applied by the copy phase.
I think a better check would be to verify the per-table target row count matches source row count for all tables as the signal for the copy phase to have finished.
In my testing (MacBook Pro with M4 Pro), MySQL was consistently the bottleneck, not the parallel vapplier code. I think it would be great if we could run benchmarks on a more powerful machine to see how the new applier scales when not limited by MySQL.
There seem to be a few inefficiencies which don't show up in the benchmarks (again, because MySQL is the limit), but that might still be worth resolving, because inefficiencies in the vapplier can still cause inefficiencies in other parts of vttablet (e.g. through higher GC pressure).
I left comments on the relevant pieces that are part of this code.
There are two allocation-specific things I noticed, but neither is trivial to fix and neither was really introduced by this PR, so I'm just writing them down here with no expectation of them being fixed in this PR:
- A lot of allocations are caused by decoding the incoming binlog event stream. That's a more general gRPC/protobuf issue we also have in other places, and I believe it's not easily fixable here because the lifetime of the objects is unclear.
- The other large contributor to allocations I saw was the creation of strings.Builder structs in ParsedQuery.GenerateQuery. There is a ParsedQuery.Append function that allows re-using a strings.Builder, but the harder question to answer is where those builders should live: there's the non-batch, the batched, and the new parallel worker paths that all need to be taken into account.
One more thing I just noticed, which might be simpler to implement:
applyChange in replicator_plan.go generates a per-row bindvar map, and re-computes the before / after field names. The binder map should be reusable (clear it between uses), and the field names should also be static. But I'm also not sure how much this actually costs given that, as pointed out before, MySQL seems to be the bottleneck anyway.
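A minimal sketch of the reuse suggested above, assuming a hypothetical rowBinder that is created once per table plan. The a_/b_ prefixes and the map[string]any value type are illustrative stand-ins for the real bind variable names and types. Go 1.21's built-in clear empties the map in place, so each row change refills one map instead of allocating a new one.

```go
package main

import "fmt"

// rowBinder is a hypothetical per-table helper: bind variable names derived
// from the table's fields are computed once, and one map is reused per row.
type rowBinder struct {
	beforeNames []string
	afterNames  []string
	bindvars    map[string]any
}

func newRowBinder(fields []string) *rowBinder {
	rb := &rowBinder{bindvars: make(map[string]any, 2*len(fields))}
	for _, f := range fields {
		rb.beforeNames = append(rb.beforeNames, "b_"+f)
		rb.afterNames = append(rb.afterNames, "a_"+f)
	}
	return rb
}

// bind refills the shared map for one row change. The returned map is only
// valid until the next call, so callers must not retain it.
func (rb *rowBinder) bind(before, after []any) map[string]any {
	clear(rb.bindvars) // Go 1.21+: empties the existing map instead of making a new one
	for i, v := range before {
		rb.bindvars[rb.beforeNames[i]] = v
	}
	for i, v := range after {
		rb.bindvars[rb.afterNames[i]] = v
	}
	return rb.bindvars
}

func main() {
	rb := newRowBinder([]string{"id", "name"})
	fmt.Println(len(rb.bind([]any{1, "old"}, []any{1, "new"}))) // UPDATE: 4
	fmt.Println(len(rb.bind(nil, []any{2, "x"})))               // INSERT: 2
}
```

The catch is lifetime: the returned map is overwritten by the next bind call, so this is only safe if the query is generated (and the bindvars are no longer referenced) before the next row is processed.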
if row == nil {
	return false
}
relevantColumns := make(map[int]struct{})
relevantColumns here seems to be rebuilt on every call, and the function is called twice per row. Isn't the content of this stable across all the rows? If so, we should probably compute it once and reuse the result.
Alternatively, I think we can iterate plan.PKIndices and the FK columns directly against row.Lengths and check for < 0 without allocating anything at all.
Yep — it's now cached per table (the same pattern as fieldIdxCache), built once per table per fetch instead of per row change; addressed in 21f86d3. I looked at the fully alloc-free direct iteration too, but the FK-joined columns need a name→index resolution, so the cached set keeps the per-row path to a single Lengths scan.
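A sketch of the cached approach described in the reply, under stated assumptions: rowImage, relevantColsCache, and missingRelevantValue are hypothetical names, and any negative length is treated as a value the writeset cannot use (in Vitess row events a length of -1 marks NULL; exactly what the real check looks for is not shown here). Name-to-index resolution for FK columns happens once per table, and the per-row path is a single scan of Lengths.

```go
package main

import "fmt"

// rowImage is a stand-in for a binlog row image: Lengths[i] is the encoded
// length of column i, and a negative length marks a value that is not present.
type rowImage struct{ Lengths []int64 }

// relevantColsCache is a hypothetical per-table cache of the column positions
// a writeset depends on: PK columns plus FK columns resolved by name.
type relevantColsCache map[string][]int

// get resolves positions on first use for a table and reuses them afterwards.
func (c relevantColsCache) get(table string, pk []bool, fieldIdx map[string]int, fkCols []string) []int {
	if cols, ok := c[table]; ok {
		return cols
	}
	seen := map[int]bool{}
	cols := []int{}
	for i, isPK := range pk {
		if isPK && !seen[i] {
			seen[i] = true
			cols = append(cols, i)
		}
	}
	for _, name := range fkCols {
		if i, ok := fieldIdx[name]; ok && !seen[i] {
			seen[i] = true
			cols = append(cols, i)
		}
	}
	c[table] = cols
	return cols
}

// missingRelevantValue is the per-row check: one scan over the cached
// positions with no allocation. Out-of-range positions count as missing so
// the caller fails closed.
func missingRelevantValue(r *rowImage, cols []int) bool {
	if r == nil {
		return false
	}
	for _, i := range cols {
		if i >= len(r.Lengths) || r.Lengths[i] < 0 {
			return true
		}
	}
	return false
}

func main() {
	cache := relevantColsCache{}
	cols := cache.get("child", []bool{true, false, false}, map[string]int{"parent_id": 2}, []string{"parent_id"})
	fmt.Println(cols)                                                             // [0 2]
	fmt.Println(missingRelevantValue(&rowImage{Lengths: []int64{1, -1, 3}}, cols)) // false: column 1 is not relevant
	fmt.Println(missingRelevantValue(&rowImage{Lengths: []int64{1, 5, -1}}, cols)) // true
}
```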
		fieldIdx[strings.ToLower(f.Name)] = i
	}
}
indexes := make([]int, 0, len(plan.IdentityColumns))
The list of indexes on the table should be stable. I think we should be able to pre-compute it, instead of re-computing it for every row.
(Similar to how fieldIdxCache works).
Done in 4589d8d — the identity column positions are now resolved once per table and cached alongside fieldIdx, and the per-change path takes them pre-resolved.
payload := make([]byte, len(writesetTextValueMarker)+len(scratch))
copy(payload, writesetTextValueMarker[:])
copy(payload[len(writesetTextValueMarker):], scratch[:])
writesetDigestAddPayload(d, payload)
Could this be optimized to not require allocating payload here?
Fun one: the allocation benchmark shows escape analysis was already keeping this on the stack — identical 1 alloc/op before and after (that remaining alloc is the xxhash.Digest escaping in the bench harness). 4589d8d switches it to a fixed-size array anyway so the no-alloc property is structural rather than escape-analysis-dependent.
func writesetDigestInit(d *xxhash.Digest, tableName string) {
	d.Reset()
	d.WriteString(tableName)
	d.Write([]byte{':'})
Should we extract []byte{':'} into a const?
Go won't let us make a []byte const, but 4589d8d extracts it to a package-level array (writesetKeySeparator) — named and guaranteed alloc-free.
streamErr := errors.New("stream ended after buffered events")
cp := dbconfigs.New(&mysql.ConnParams{DbName: testenv.DBName})
vse := &Engine{keyspace: testenv.DBName, shard: testenv.DefaultShard, throttledCounts: stats.NewCounter("", "")}
This one's safe by design: throttle.Client.ThrottleCheckOK nil-checks its receiver (client.go — if c == nil { return emptyCheckResult, true }), so a bare Engine's nil throttlerClient short-circuits to "not throttled" rather than panicking — which is why this test passes deterministically, including under -race. The other throttle tests initialize throttlerClient because they want throttling (via TestingAlwaysThrottledName), not because nil panics. Added a comment on the fixture so this doesn't keep getting flagged.
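A reduced sketch of the nil-receiver pattern the reply relies on. The types and the simplified ThrottleCheckOK signature here are hypothetical stand-ins, not the real throttle package API; only the if c == nil guard is taken from the comment above.

```go
package main

import "fmt"

// checkResult and throttleClient are hypothetical stand-ins, not the real
// throttle package types; the method signature is simplified.
type checkResult struct{ message string }

var emptyCheckResult = &checkResult{}

type throttleClient struct{ alwaysThrottle bool }

// ThrottleCheckOK may be called on a nil *throttleClient: Go allows calling a
// pointer-receiver method on a nil pointer, and the guard below handles it.
func (c *throttleClient) ThrottleCheckOK() (*checkResult, bool) {
	if c == nil {
		return emptyCheckResult, true // no client configured: never throttle
	}
	if c.alwaysThrottle {
		return &checkResult{message: "throttled"}, false
	}
	return emptyCheckResult, true
}

// engine mimics a bare test fixture whose client field is left nil.
type engine struct{ throttlerClient *throttleClient }

func main() {
	var bare engine
	_, ok := bare.throttlerClient.ThrottleCheckOK() // no nil-pointer panic
	fmt.Println(ok)                                  // true
}
```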
Description
Warning: This feature is experimental.
The parallel apply system enables VReplication to apply binlog events using multiple concurrent MySQL connections instead of a single serial connection. When --vreplication-parallel-replication-workers is set to N > 1, incoming transactions are analyzed for writeset conflicts and dispatched to N worker goroutines, each with its own MySQL connection. Transactions that touch different primary keys run concurrently; transactions that conflict are serialized.

The design is inspired by MySQL's own multi-threaded replica applier (MTA), specifically the WRITESET dependency tracking mode. We recompute writesets on the target side from the row events themselves, rather than relying on the source's binlog_transaction_dependency_tracking setting. This means the parallel applier works regardless of whether the source uses COMMIT_ORDER or WRITESET dependency tracking, or no dependency tracking at all. That is a critical property, as we need to support importing databases of various versions and configurations into Vitess.

The design (most notably the in-order commit of transactions, see below) also supports another critical property: it must live alongside the traditional serial applier/vplayer and provide the same external, visible semantics (position and lag management, metrics, etc.). This will be experimental for some time and needs time to bake before we can even consider fully replacing the serial applier code. We cannot risk destabilizing this critical component or primitive within VReplication, because performance without correctness is worthless.
Goroutine Architecture
Four types of goroutines cooperate in a pipeline:
1. scheduleLoop (single goroutine)
The schedule loop reads batched events from the relay log, parses them into logical transactions, builds writesets, and enqueues applyTxn structs into the scheduler. It runs on the main goroutine of applyEventsParallel.
Responsibilities:
- relay.Fetch()
- sequenceNumber, commitParent) from GTID events
- unsavedEvent (bypasses the scheduler entirely)
- time_throttled

2. workerLoop (N goroutines)
Each worker goroutine calls scheduler.nextReady() to block until a transaction is ready to execute, then applies its row events using the worker's private MySQL connection.
Responsibilities:
- workerLocalVPlayer) once per worker lifetime, exposing only the fields workers may share (tablePlans, replicatorPlan, serialMu, etc.); foreign_key_checks session state is tracked per connection on the vdbClient, initialized when the worker's connections are created
- worker.applyEvent(), which rebinds the worker's dbClient/query/commit onto that local view (the orchestrator's vplayer is never mutated)
- commitCh
- txn.done until commitLoop finishes committing (this prevents the worker from reusing its connection while commitLoop is still writing to _vt.vreplication on it)
- Commit_order_manager for exactly this). RC takes no gap locks for row-image application and is MySQL's own recommendation for row-based parallel replicas; statement-based events force-serialize, so RC cannot change their outcome.

3. commitLoop (single goroutine)
The commit loop receives completed transactions from workers and commits them in strict order (by the order field). This guarantees that the position saved to _vt.vreplication is monotonically increasing.
Responsibilities:
- pending map and nextOrder counter
- client/query/commit) WITHOUT holding serialMu, so slow MySQL commits never block the scheduleLoop; only the brief vplayer bookkeeping afterwards takes serialMu
- markCommitted() to release inflight state
- io.EOF when stop position is reached

4. applyScheduler (shared state, not a goroutine)
The scheduler is a shared data structure protected by a mutex. It determines which transactions can execute concurrently based on writeset conflicts and transaction classification.
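A minimal sketch of how the schedule loop, workers, and commit loop fit together. The scheduler is reduced to an unbuffered channel (no writeset conflicts, classification, batching, or error handling), and applyTxn and runPipeline are simplified, hypothetical stand-ins. What it does show is the ordered-commit protocol described in this section: a pending map, nextOrder starting at 1, a done channel gating connection reuse (closed here, rather than sent on as a buffered channel), and the shutdown order described under Shutdown Protocol.

```go
package main

import (
	"fmt"
	"sync"
)

// applyTxn is a simplified stand-in: only the commit order is modeled.
type applyTxn struct {
	order int
	done  chan struct{} // closed by the commit loop once this txn is committed
}

// runPipeline wires a schedule step, n workers, and an ordered commit loop and
// returns the order in which transactions were committed. Workers may finish
// in any order; the commit loop buffers early arrivals and only commits
// nextOrder.
func runPipeline(numTxns, workers int) []int {
	ready := make(chan *applyTxn)    // stands in for scheduler.nextReady()
	commitCh := make(chan *applyTxn) // workers -> commitLoop
	commitDone := make(chan struct{})
	var committed []int

	go func() { // commitLoop
		defer close(commitDone)
		pending := map[int]*applyTxn{}
		nextOrder := 1
		for txn := range commitCh {
			pending[txn.order] = txn
			for {
				next, ok := pending[nextOrder]
				if !ok {
					break
				}
				committed = append(committed, next.order) // real code: position save + COMMIT
				delete(pending, nextOrder)
				close(next.done) // worker may now reuse its connection
				nextOrder++
			}
		}
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() { // workerLoop
			defer wg.Done()
			for txn := range ready {
				// real code: apply the row events on this worker's connection
				commitCh <- txn
				<-txn.done
			}
		}()
	}

	for i := 1; i <= numTxns; i++ { // scheduleLoop
		ready <- &applyTxn{order: i, done: make(chan struct{})}
	}
	close(ready)    // like scheduler.close(): wake and stop workers
	wg.Wait()       // all workers have exited
	close(commitCh) // no more transactions will arrive
	<-commitDone    // commit loop has drained
	return committed
}

func main() {
	fmt.Println(runPipeline(10, 4)) // [1 2 3 4 5 6 7 8 9 10]
}
```

Because each worker in the sketch waits on txn.done before taking more work, it can hold at most one uncommitted transaction; the real scheduler bounds outstanding work explicitly instead (maxOutstandingOrders).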
Transaction Classification
Every transaction enqueued to the scheduler is classified with these boolean flags:
- forceGlobal
- hasCommitMeta: sequenceNumber != 0 || commitParent != 0 in the GTID event
- noConflict

Ready-Check Hierarchy (isReadyLocked)
The scheduler checks readiness in this order:
1. noConflict → always ready. These are position-only saves that have no data and no side effects beyond updating _vt.vreplication. They bypass all conflict checking to prevent deadlocks where a position save with an earlier order is blocked by inflight data transactions.
2. inflightGlobal > 0 → blocked. Any inflight global transaction blocks everything.
3. forceGlobal → ready only when ALL inflight counters are zero (no inflight transactions of any kind).
4. hasCommitMeta with inflightMissingMeta > 0 → blocked. Transactions with commit metadata cannot run alongside transactions lacking it (safety boundary).
5. hasCommitMeta with non-empty writeset → ready if no inflight writeset conflicts. This is the key optimization: writeset-only conflict detection, skipping the commit-parent dependency check entirely. This allows parallelism even when the source is not using WRITESET-based dependency tracking, which would otherwise produce a strict serial chain.
6. hasCommitMeta with empty writeset → falls back to commit-parent ordering (commitParent <= lastCommittedSequence).
7. No commit metadata, no writeset → treated as global (increments inflightGlobal).
8. No commit metadata, has writeset → checks writeset conflicts, and must wait if inflightCommitMeta > 0.

Inflight Tracking
Four counters track what's currently being applied:
- inflightGlobal: forceGlobal transactions + no-meta-no-writeset transactions
- inflightMissingMeta
- inflightCommitMeta
- inflightWriteset

Writeset Conflict Detection
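A condensed sketch of the readiness rules listed above, using hypothetical txnFlags and schedState types (the four counters plus a refcount per inflight writeset key). It mirrors the order of checks only; how the real isReadyLocked updates the counters on dispatch and markCommitted() is not modeled, and treating a no-metadata, no-writeset transaction as ready only when nothing is inflight is an assumption based on "treated as global".

```go
package main

import "fmt"

// txnFlags holds the classification inputs (hypothetical type).
type txnFlags struct {
	noConflict    bool
	forceGlobal   bool
	hasCommitMeta bool
	commitParent  int64
	writeset      []uint64
}

// schedState holds the four inflight counters plus a refcount per inflight key.
type schedState struct {
	inflightGlobal, inflightMissingMeta, inflightCommitMeta, inflightWriteset int
	inflightKeys          map[uint64]int
	lastCommittedSequence int64
}

func (s *schedState) nothingInflight() bool {
	return s.inflightGlobal+s.inflightMissingMeta+s.inflightCommitMeta+s.inflightWriteset == 0
}

func (s *schedState) conflicts(ws []uint64) bool {
	for _, k := range ws {
		if s.inflightKeys[k] > 0 {
			return true
		}
	}
	return false
}

// isReady follows the ready-check order, one case per rule.
func (s *schedState) isReady(t txnFlags) bool {
	switch {
	case t.noConflict:
		return true // position-only save
	case s.inflightGlobal > 0:
		return false
	case t.forceGlobal:
		return s.nothingInflight()
	case t.hasCommitMeta && s.inflightMissingMeta > 0:
		return false
	case t.hasCommitMeta && len(t.writeset) > 0:
		return !s.conflicts(t.writeset) // commit parent deliberately ignored
	case t.hasCommitMeta:
		return t.commitParent <= s.lastCommittedSequence
	case len(t.writeset) == 0:
		return s.nothingInflight() // no metadata, no writeset: treated as global
	default:
		return !s.conflicts(t.writeset) && s.inflightCommitMeta == 0
	}
}

func main() {
	s := &schedState{inflightKeys: map[uint64]int{7: 1}, inflightCommitMeta: 1, inflightWriteset: 1}
	fmt.Println(s.isReady(txnFlags{hasCommitMeta: true, writeset: []uint64{7}})) // false: key 7 is inflight
	fmt.Println(s.isReady(txnFlags{hasCommitMeta: true, writeset: []uint64{8}})) // true
	fmt.Println(s.isReady(txnFlags{forceGlobal: true}))                          // false
}
```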
PK-Based Keys
For each row change (INSERT, UPDATE, DELETE), the writeset extractor hashes the table name and primary key values into a uint64 using xxhash (XXH64). Conceptually the key represents tableName:pk1,pk2,..., but using fixed-size hashes instead of heap-allocated strings eliminates the dominant per-transaction allocation source at high TPS. Both the before-image and the after-image are hashed, because an UPDATE that changes a PK value must conflict with both the old and the new key.

The PK column indices are stored in TablePlan.PKIndices (a []bool where true at index i means column i is part of the PK). This is populated when the replicator plan is built from FIELD events.

Unique-Key Keys
Unique secondary indexes make transactions on different rows order-dependent: one transaction frees a unique value and another claims it, so PK keys alone would schedule the pair in parallel and the second to apply would hit a duplicate-key error. Mirroring MySQL's WRITESET dependency tracking — which hashes every unique key, not just the PK — the writeset extractor also emits a key per hashable unique secondary index for both row images, with the index ordinal folded into the digest so different indexes on the same table occupy distinct key spaces. A NULL in any key column emits no key for that image (MySQL unique indexes permit multiple NULLs, so a NULL-valued key cannot conflict with anything). Only transactions actually colliding on a unique value serialize against each other; everything else stays parallel.
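A minimal sketch of the PK and unique-key hashing described above. Assumptions: FNV-1a from hash/fnv replaces xxhash, row values are *string with nil meaning NULL, and the PK is given ordinal 0 with unique indexes numbered from 1; tableInfo, keyFor, and writesetKeys are hypothetical. The example shows two transactions with different primary keys that collide on a unique email value, which is exactly the pair that PK keys alone would have run in parallel.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// tableInfo is a hypothetical, simplified table plan: PK column positions and
// one position list per unique secondary index.
type tableInfo struct {
	name    string
	pkCols  []int
	uniques [][]int
}

// keyFor hashes table name, index ordinal (0 = PK, i+1 = i-th unique index)
// and the key column values. ok is false if any value is NULL (nil).
func keyFor(table string, ordinal uint32, cols []int, row []*string) (uint64, bool) {
	h := fnv.New64a()
	h.Write([]byte(table))
	h.Write([]byte{':'})
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], ordinal)
	h.Write(buf[:])
	for _, c := range cols {
		v := row[c]
		if v == nil {
			return 0, false // NULLs never collide in a MySQL unique index
		}
		// Length prefix so ("a","bc") and ("ab","c") hash differently.
		binary.BigEndian.PutUint32(buf[:], uint32(len(*v)))
		h.Write(buf[:])
		h.Write([]byte(*v))
	}
	return h.Sum64(), true
}

// writesetKeys emits PK and unique-index keys for both row images; a nil
// image is the missing side of an INSERT or DELETE.
func writesetKeys(t tableInfo, before, after []*string) []uint64 {
	var keys []uint64
	for _, img := range [][]*string{before, after} {
		if img == nil {
			continue
		}
		if k, ok := keyFor(t.name, 0, t.pkCols, img); ok {
			keys = append(keys, k)
		}
		for i, cols := range t.uniques {
			if k, ok := keyFor(t.name, uint32(i+1), cols, img); ok {
				keys = append(keys, k)
			}
		}
	}
	return keys
}

func strp(s string) *string { return &s }

func main() {
	users := tableInfo{name: "users", pkCols: []int{0}, uniques: [][]int{{1}}}
	del := writesetKeys(users, []*string{strp("1"), strp("a@x")}, nil) // frees the email
	ins := writesetKeys(users, nil, []*string{strp("2"), strp("a@x")}) // claims it
	fmt.Println(del[0] != ins[0], del[1] == ins[1])                    // true true
}
```

The same helper also illustrates the FK-aware keys described next: keyFor(parentTable, 0, fkCols, childRow) writes the same bytes as the parent's PK key whenever the FK values equal the parent's PK values, so the two hash to the same key and conflict.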
FK-Aware Keys
When foreign key constraints exist, a transaction that modifies a child table row must serialize with transactions that modify the referenced parent row. At startup, the parallel applier queries information_schema.KEY_COLUMN_USAGE to discover all FK constraints. For each child row change, it hashes the parent table name and FK column values into a uint64, using the same xxhash scheme as PK keys, so the hash will match the parent table's PK-based writeset hash. This forces the scheduler to see a conflict between child and parent operations.

Copy-on-Write Table Plan Snapshot
The scheduleLoop needs to read tablePlans to build writesets, but the serial applier path mutates tablePlans when FIELD events arrive. To avoid holding a read lock across writeset computation, we use a copy-on-write snapshot: snapshotTablePlans copies the map only when tablePlansVersion has changed since the last snapshot. FIELD events increment the version.

Serialization Escapes (fail closed)
Whenever the writeset hasher cannot prove the absence of a conflict, the transaction is routed to the serial path (forceGlobal) rather than guessed at:
- DataColumns/JsonPartialValues, noblob -1 lengths on relevant columns): omitted columns can shift values into wrong field slots, so they serialize.
- information_schema and a per-table stale-plan barrier force-serializes transactions touching affected tables until their refreshed FIELD event replaces the cached plan.

Transaction Batching
The schedule loop merges consecutive transactions in the same relay log fetch into a single larger transaction, mirroring the serial applier's hasAnotherCommit lookahead. This reduces the number of MySQL COMMITs.
Batching rules:
- time_updated and lag metrics fresh

When batching merges a transaction, its sequenceNumber is advanced via scheduler.advanceCommittedSequence() so that future transactions whose commitParent references the merged-away sequence are not blocked forever.

Empty Transaction Handling
Empty transactions (GTID → BEGIN → COMMIT with no row events) are very common (from filtered-out tables on the same source shard). They bypass the scheduler entirely:
- vp.unsavedEvent
- idleTimeout (1 second) passes with no real commits, the position is saved as a noConflict commit-only transaction via enqueueCommitOnly
- lastCommittedSequence is advanced for the empty transaction's sequence number, preventing dependent transactions from being blocked

During catch-up, empty transactions can arrive in a continuous stream, preventing the idle timeout from firing. A separate lastHeartbeatRefresh timer periodically updates time_updated directly via SQL to keep max_v_replication_lag fresh.

Commit Ordering
Commits must be strictly ordered because _vt.vreplication stores a single position, and that position must only move forward. The commitLoop achieves this with:
- order number (assigned by scheduleLoop)
- nextOrder starting at 1
- commitCh; they're buffered in a pending map
- nextOrder is incremented after each commit
- markCommitted() releases the transaction's inflight state in the scheduler, potentially unblocking waiting transactions (a released multi-key writeset can ready several pending transactions; dispatched workers pass the wakeup baton so all of them start without waiting for the next commit)

End-to-end backpressure: the scheduler caps outstanding ordered work at roughly one applying transaction per worker plus the commit buffer (maxOutstandingOrders ≈ 5× workers), so that when the commitLoop stalls on an early order, the pipeline cannot run arbitrarily far ahead of durable progress.

Worker Transaction Commit Protocol
When committing a worker's transaction, the commitLoop (without holding serialMu for any of the MySQL work):
- updatePosWithoutStop, inside the worker's open MySQL transaction (in batch mode this rides the same multi-statement flush as the COMMIT)
- setStopPositionStateImmediate on the same worker connection, inside the same transaction, so position, row changes, and stop state commit atomically and no cross-connection lock ordering exists
- serialMu to update vplayer bookkeeping (recordPositionSave, pending FIELD-refresh counters)
- markCommitted() to release the transaction's inflight state in the scheduler
- txn.done (a buffered channel) to let the worker reuse the connection; returns io.EOF if the stop position was reached

Startup and Configuration
Flag:
- --vreplication-parallel-replication-workers N
- vreplication-parallel-replication-workers config override in the workflow's options column

Activation path:
- vreplicator.replicate() creates a vplayer and calls vp.play() → vp.fetchAndApply()
- fetchAndApply checks ParallelReplicationWorkers > 1 and len(copyState) == 0
- vp.applyEventsParallel() instead of vp.applyEvents()
- applyWorker instances, each with its own filtered vdbClient MySQL connection
- information_schema at startup

Parallel apply is only active during the replication (running) phase, not during the copy phase. During copy, the serial applier is always used.
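A sketch of the activation decision, assuming (purely for illustration) that the options column holds a JSON object with a string-valued config map; workflowOptions, effectiveWorkers, and useParallelApply are hypothetical names, and the real column layout and parsing may differ. The per-workflow override wins when it parses to a positive number; otherwise the flag value is used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

const workersKey = "vreplication-parallel-replication-workers"

// workflowOptions is a hypothetical shape for the options column; only a
// string-valued config override map is modeled.
type workflowOptions struct {
	Config map[string]string `json:"config"`
}

// effectiveWorkers returns the per-workflow override if it parses to a
// positive integer, otherwise the flag value.
func effectiveWorkers(flagValue int, optionsJSON string) int {
	var opts workflowOptions
	if err := json.Unmarshal([]byte(optionsJSON), &opts); err != nil {
		return flagValue
	}
	if v, ok := opts.Config[workersKey]; ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return flagValue
}

// useParallelApply mirrors the activation check: more than one worker and no
// tables left to copy (the copy phase always uses the serial applier).
func useParallelApply(workers int, copyStateLen int) bool {
	return workers > 1 && copyStateLen == 0
}

func main() {
	w := effectiveWorkers(1, `{"config":{"vreplication-parallel-replication-workers":"4"}}`)
	fmt.Println(w, useParallelApply(w, 0), useParallelApply(w, 3)) // 4 true false
}
```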
Shutdown Protocol
applyEventsParallel orchestrates a clean shutdown sequence:
- scheduleLoop returns (either from error, context cancellation, or relay log EOF)
- scheduler.close() is called, which broadcasts to wake all blocked workers
- wg.Wait() waits for all worker goroutines to exit
- close(commitCh) signals commitLoop that no more transactions will arrive
- <-commitDone waits for commitLoop to drain remaining buffered transactions
- applyErr (from commitLoop/scheduleLoop) takes priority over workerErr
- io.EOF and context.Canceled are converted to nil (the caller treats nil as a clean stop)

Relationship to MySQL Multi-Threaded Applier
MySQL's multi-threaded replica applier (MTA) has three dependency tracking modes:
- COMMIT_ORDER
- WRITESET: binlog_transaction_dependency_tracking.
- WRITESET_SESSION: WRITESET, but transactions from the same session are additionally serialized.

The Vitess parallel applier most closely resembles WRITESET mode, but with key differences:
- Writesets are computed on the target, not taken from the source binlog. This means parallelism is available regardless of the source's binlog_transaction_dependency_tracking setting.
- Commit metadata is used as a fallback, not as the primary mechanism. When a writeset can be computed (non-empty PK indices, no errors), writeset-only conflict detection is used. The commitParent field is only used when the writeset is empty (build failure or no row events).
- FK awareness is built in. MySQL's MTA does not consider FK constraints in its writeset tracking. The Vitess parallel applier queries information_schema.KEY_COLUMN_USAGE at startup and generates additional writeset keys that create conflicts between child and parent table operations.
- Commit ordering is strict. Unlike MySQL's MTA, which can commit out of order in WRITESET mode unless slave_preserve_commit_order (replica_preserve_commit_order in newer releases) is enabled, the Vitess parallel applier always commits in order. This simplifies position tracking (a single position in _vt.vreplication) and avoids gaps in the position that could confuse WaitForPos or other external observers.

Key Design Trade-offs
Strict commit ordering vs. throughput
Committing in strict order means a slow transaction can stall all commits behind it, even if their MySQL work is done. We accept this because:
- WaitForPos and monitoring tools expect a single consistent position
- UPDATE _vt.vreplication + COMMIT); the bottleneck is the workers' apply time

Writeset-only detection vs. commit-parent chains
By ignoring the source's commit-parent chain when a valid writeset exists, we achieve parallelism even when the source uses COMMIT_ORDER. The trade-off is that we must build writesets ourselves, which adds CPU overhead in the scheduleLoop. In practice, writeset computation is cheap (xxhash digests of PK values with no heap allocations).

FK batching trade-off
Without FK constraints, transaction batching reduces MySQL COMMIT overhead. With FK constraints, batching merges independent parent/child operations into single large writesets that always conflict, destroying parallelism. The solution is to skip batching when FK refs are present, accepting more frequent COMMITs in exchange for actual parallelism.
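A sketch of the batching decision under stated assumptions: srcTxn, batch, and batchFetch are hypothetical, the cap of workerCount * 4 comes from the Key Findings below, and advance stands in for scheduler.advanceCommittedSequence(). Other batching rules (such as the one that keeps time_updated and lag metrics fresh) are omitted.

```go
package main

import "fmt"

// srcTxn and batch are hypothetical simplified types.
type srcTxn struct {
	seq  int64
	keys []uint64
}

type batch struct {
	txns []srcTxn
	keys []uint64 // union of the merged writesets
}

// batchFetch groups consecutive transactions from one relay-log fetch. With
// FK refs present it does not merge at all, trading more COMMITs for writesets
// small enough to run in parallel. advance is called for every merged-away
// transaction so dependents waiting on its sequence are not blocked.
func batchFetch(fetched []srcTxn, workerCount int, hasFKRefs bool, advance func(seq int64)) []batch {
	maxBatched := workerCount * 4
	if hasFKRefs || maxBatched < 1 {
		maxBatched = 1
	}
	var out []batch
	for _, t := range fetched {
		if n := len(out); n > 0 && len(out[n-1].txns) < maxBatched {
			b := &out[n-1]
			b.txns = append(b.txns, t)
			b.keys = append(b.keys, t.keys...)
			advance(t.seq)
			continue
		}
		out = append(out, batch{txns: []srcTxn{t}, keys: append([]uint64(nil), t.keys...)})
	}
	return out
}

func main() {
	var fetched []srcTxn
	for i := int64(1); i <= 10; i++ {
		fetched = append(fetched, srcTxn{seq: i, keys: []uint64{uint64(i)}})
	}
	merged := 0
	batches := batchFetch(fetched, 2, false, func(int64) { merged++ })
	fmt.Println(len(batches), merged) // 2 8
}
```

The merged writeset is the union of its members' keys, which is why merging is counterproductive when FK keys make almost every parent/child pair collide.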
Head-of-line blocking in the pending queue
popReadyLocked stops scanning at the first non-ready, non-noConflict transaction. This prevents dispatching a later transaction whose inflight state could block the earlier one from ever becoming ready, which would deadlock with the commitLoop's strict ordering. The trade-off is that a single blocked transaction at the head of the queue stalls all transactions behind it, even non-conflicting ones. This is a correctness-over-throughput choice.

Per-worker vplayer view
Each worker builds a narrow vplayer value once at startup via workerLocalVPlayer, containing only the fields workers are allowed to share (tablePlans plus its mutex/version, replicatorPlan, serialMu, etc.). This gives each worker its own dbClient/query/commit bindings without fine-grained locking, while structurally preventing workers from reaching into main-goroutine-owned vplayer state. The shared mutable fields on vplayer are pointers (*sync.Mutex, *sync.RWMutex, *atomic.Int64, *atomic.Pointer) so the view and the orchestrator's vplayer alias the same synchronization state; per-session state like foreign_key_checks lives on the vdbClient (per connection) instead of the vplayer. Note that this makes "vplayer is safely copyable" a load-bearing property: new mutable non-pointer fields must not be added without considering the worker view.

Source Files
- parallel_apply.go: applyEventsParallel, scheduleLoop, scheduleItems, workerLoop, commitLoop, sync.Pool, lag computation, post-DDL stale-plan barriers
- parallel_apply_scheduler.go: applyScheduler (enqueue, nextReady, isReadyLocked), inflight tracking, pending queue management
- parallel_apply_worker.go: applyWorker (connection setup, applyEvent field swapping on the vplayer copy)
- parallel_apply_writeset.go: buildTxnWriteset, writesetKeysForChange, writesetKeysForFKRef, snapshotTablePlans, queryFKRefs

Benchmarks
Benchmark Suite
A local benchmark suite is included in examples/benchmark/ to measure parallel applier throughput in isolation. The suite consists of:
- bench_setup.sh: PARALLEL_WORKERS
- bench_run.sh
- bench_generate_load.sh
- bench_compare.sh
- create_bench_schema.sql: bench_orders, bench_events, bench_accounts, bench_logs)
- vschema_bench.json

Methodology
The benchmark uses a pause-load-resume approach to isolate applier throughput from vstreamer rate:
- commerce→customer and let the copy phase complete

Timing is GTID-based, not lag-based. The benchmark captures the source's GTID sequence number after generating the backlog, then polls the target's GTID sequence until it reaches or exceeds the source. This provides an exact measurement of when all backlog transactions have been applied.
Configuration
The benchmark tunes MySQL to remove fsync as a variable and isolate the applier:
- innodb_buffer_pool_chunk_size=1M to allow sub-128MB sizing)
- innodb_flush_log_at_trx_commit=0, sync_binlog=0 on all tablets
- innodb_change_buffering=none on target tablets (forces immediate B-tree page reads on every DML)
- durability-policy=none (no semi-sync)

Results
Row count validation: PASS (all 4 tables match source ↔ target).
The numbers above reflect a representative single A/B run on the same hardware. Across 10+ iterations of the same benchmark, individual drain times vary (parallel: 283–330s, 606–706 ops/sec; serial: 419–521s, 383–477 ops/sec), with single-run speedup ratios ranging from ~1.4x to 1.7x. The parallel applier consistently hits 600+ ops/sec and 280–330s drain; serial throughput is more sensitive to OS-level page cache and disk I/O variance between runs, which is the dominant source of speedup-ratio variance.
Key Findings
The relay log size is critical. Large relay logs (250MB / 500K items) let the serial applier build massive mega-transactions — merging all ~194K source transactions into a single MySQL transaction with one COMMIT. This amortizes per-commit overhead so effectively that the serial applier outperforms the parallel applier. Default relay log sizes (250KB / 5000 items) limit serial batches to ~200 source transactions per commit, which exposes the applier bottleneck and allows parallelism to help.
Buffer pool sizing matters for parallel workers. With an 8MB buffer pool and 4 workers, each worker effectively has 2MB of InnoDB buffer pool, causing destructive cache thrashing between workers. Sizing the buffer pool so each worker gets at least as much cache as the serial applier gets total (32MB / 4 = 8MB per worker) eliminates this problem.
FIELD event handling affects parallelism. FIELD events (table metadata) previously fell through to the default case in scheduleItems, setting curRowOnly=false and causing ~50% of transactions to be forceGlobal=true, which effectively serialized the scheduler. Adding an explicit FIELD event handler reduced forceGlobal to ~1.8% (only at startup, when table plans haven't been applied yet).

Transaction batching (maxBatchedCommits) reduces per-commit overhead. Setting maxBatchedCommits = workerCount * 4 (=16 for 4 workers) merges multiple source transactions into each mega-transaction, reducing the number of MySQL COMMITs, position updates, and scheduler dispatches by up to 16x while still producing enough independent mega-transactions to keep all workers busy.

Related Issue(s)
Checklist
AI Disclosure
I worked with OpenCode + Codex-5.2 and Opus 4.6 + Copilot on this.